2 Cellular Disco: resource management using virtual clusters on shared-memory
multiprocessors 

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum 
December 1999 ACM SIGOPS Operating Systems Review , Proceedings of the 
seventeenth ACM symposium on Operating systems principles. 
Full text available: pdf(1.93 MB) Additional Information: full citation, abstract, references, citings, index terms

Despite the fact that large-scale shared-memory multiprocessors have been
commercially available for several years, system software that fully utilizes all their 
features is still not available, mostly due to the complexity and cost of making the 
required changes to the operating system. A recently proposed approach, called 
Disco, substantially reduces this development cost by using a virtual machine 
monitor that leverages the existing operating system technology. In this paper we 
present a syste ... 

3 The network architecture of the Connection Machine CM-5 (extended abstract)
Charles E. Leiserson, Zahi S. Abuhamdeh, David C. Douglas, Carl R. Feynman, Mahesh
N. Ganmukhi, Jeffrey V. Hill, Daniel Hillis, Bradley C. Kuszmaul, Margaret A. St. Pierre,

22nd annual international symposium on Computer architecture,

Volume 23 Issue 2 

Full text available: pdf Additional Information: full citation, abstract, references, citings, index terms



S-Connect is a new high speed, scalable interconnect system that has been 
developed to support networks of workstations to efficiently share computing
resources. It uses off-the-shelf CMOS technology to directly drive fiber-optic
systems at speeds greater than 1 Gbit/sec and can realize bisection bandwidths 
comparable to high-end MPP systems while being >10x more cost-effective. 
S-Connect systems do not rely on centralized switches, but rather are composed of 
adaptive, topology independen ... 

5 Experience Using Multiprocessor Systems— A Status Report 

Anita K. Jones, Peter Schwarz 

June 1980 ACM Computing Surveys (CSUR), Volume 12 Issue 2 
Gerhard W. Geitz, Ernst J. Schmitter

May 1981 Proceedings of the 8th annual symposium on Computer Architecture 

Full text available: pdf(483.41 KB) Additional Information: full citation, abstract, references, index terms

The paper considers possibilities of distributed architecture to improve the reliability 
of microcomputer systems to realize a fault-tolerant system. By using and 
extending existing redundancies of hardware, software, and time, a partially 
meshed ring structure that meets the requirements of a fault-tolerant architecture 
has been designed. Aspects of hardware implementation, system software 
structure, operating system requirements, fault diagnosis, and reconfiguration are 
explained, based o ... 

7 Requirements and the concept of cooperative system management 
Bharat Bhushan, Ahmed Patel 

May 1998 International Journal of Network Management, Volume 8 Issue 3

Full text available: pdf(167.03 KB) Additional Information: full citation, abstract, references, citings, index terms

Cooperation among various types of management functions is necessary to allow 
management functions to interwork in providing and using information and services
for systems management. To understand these tasks from the point of view of 
cooperative working, this article discusses the requirements and presents the 
concept of cooperative system management. © 1998 John Wiley & Sons, Ltd. 

8 Papers: YESSIR: a simple reservation mechanism for the Internet 
Ping Pan, Henning Schulzrinne 

April 1999 ACM SIGCOMM Computer Communication Review, Volume 29 Issue 2 
Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, citings

RSVP has been designed to support resource reservation in the Internet. However, 
it has two major problems: complexity and scalability. The former results in large 
message processing overhead at end systems and routers, and inefficient firewall 
processing at the edge of the network. The latter implies that in a backbone 
environment, the amount of bandwidth consumed by refresh messages and the 
storage space that is needed to support a large number of flows at a router are too 
large. We have devel ... 

9 Cellular disco: resource management using virtual clusters on shared-memory 
multiprocessors 

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum 
August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 3 
Full text available: pdf(287.05 KB) Additional Information: full citation, abstract, references, citings, index terms, review



Despite the fact that large-scale shared-memory multiprocessors have been 
commercially available for several years, system software that fully utilizes all their
features is still not available, mostly due to the complexity and cost of making the
required changes to the operating system. A recently proposed approach, called 
Disco, substantially reduces this development cost by using a virtual machine 
monitor that leverages the existing operating system technology. In this paper we 
present a ... 

Keywords: fault containment, resource management, scalable multiprocessors,
virtual machines 



10 Internet Nuggets 
Mark Thorson 

March 1996 ACM SIGARCH Computer Architecture News, Volume 24 Issue 1 

Full text available: pdf(393.52 KB) Additional Information: full citation, abstract

This column consists of selected traffic from the comp.arch newsgroup, a forum for 
discussion of computer architecture on Internet — an international computer 
network. 

11 Knowledge based fault management for OSI networks 
Celia A. Joseph, A. Sherzer, K. Muralidhar

June 1990 Proceedings of the third international conference on Industrial and 
engineering applications of artificial intelligence and expert systems 
- Volume 1 

Full text available: pdf(826.21 KB) Additional Information: full citation, abstract, references, index terms

The OSI Fault Management system (OSIFaM) is an evolving knowledge-based
system for fault management of Open System Interconnection (OSI) networks. Our
goal is to develop a knowledge-based tool that will reduce the expertise needed to 
recognize, diagnose and correct faults in OSI networks. For our first 
implementation, we are focusing on MAP 3.0 networks. This paper provides an 
overview of fault management in general, a brief survey of other fault management 
developments, the characteristics ... 

12 Architecture of the IBM System/370
Richard P. Case, Andris Padegs 

January 1978 Communications of the ACM, Volume 21 Issue 1 

Full text available: pdf(2.78 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper discusses the design considerations for the architectural extensions that 
distinguish System/370 from System/360. It comments on some experiences with 
the original objectives for System/360 and on the efforts to achieve them, and it 
describes the reasons and objectives for extending the architecture. It covers 
virtual storage, program control, data-manipulation instructions, timing facilities, 
multiprocessing, debugging and monitoring, error handling, and input/output 
operations. ... 

Keywords: architecture, computer systems, error handling, instruction sets, 
virtual storage 



13 Selective interpretation as a technique for debugging computationally intensive 
programs 

B. B. Chase, R. T. Hood 

July 1987 ACM SIGPLAN Notices , Papers of the Symposium on Interpreters 
and interpretive techniques, Volume 22 Issue 7

Full text available: pdf(772.68 KB) Additional Information: full citation, abstract, index terms

As part of Rice University's project to build a programming environment for
scientific software, we have built a facility for program execution that solves some
of the problems inherent in debugging large, computationally intensive programs.
By their very nature such programs do not lend themselves to full-scale
interpretation. In moderation however, interpretation can be extremely useful
during the debugging process. In addition to discussing the particular benefits that 
we expect from interpre ... 

14 Linux Print System at Cisco Systems, Inc. 
Damian Ivereigh 
October 1998 Linux Journal 

Full text available: html(40.10 KB) Additional Information: full citation, abstract, index terms

Cisco runs a redundant system of 50 print servers using Linux, Samba and 
Netatalk. It prints to approximately 1,600 printers worldwide, serving 10,000 UNIX 
and Windows 95 users, some of whom are in mission-critical environments 




15 Designing SoCs for yield improvement: Using embedded FPGAs for SoC yield
improvement 

Miron Abramovici, Charles Stroud, Marty Emmert 

June 2002 Proceedings of the 39th conference on Design automation 

Full text available: pdf(200.31 KB) Additional Information: full citation, abstract, references, citings, index terms

In this paper we show that an embedded FPGA core is an ideal host to implement
infrastructure IP for yield improvement in a bus-based SoC. We present methods
for testing, diagnosing, and repairing embedded FPGAs, for which complete 
testability is achieved without any area overhead or performance degradation. We 
show how an FPGA core can provide embedded testers for other cores in the SoC, 
so that cores designed to be tested with external vectors can be tested with BIST, 
and the entire SoC can be ... 



16 Networks: A network-failure-tolerant message-passing system for terascale
clusters 

Richard L. Graham, Sung-Eun Choi, David J. Daniel, Nehal N. Desai, Ronald G. 
Minnich, Craig E. Rasmussen, L. Dean Risinger, Mitchel W. Sukalski
June 2002 Proceedings of the 16th international conference on
Supercomputing 

Full text available: pdf(148.66 KB) Additional Information: full citation, abstract, references, citings, index terms

The Los Alamos Message Passing Interface (LA-MPI) is an end-to-end
network-failure-tolerant message-passing system designed for terascale clusters.
LA-MPI is a standard-compliant implementation of MPI designed to tolerate 
network-related failures including I/O bus errors, network card errors, and 
wire-transmission errors. This paper details the distinguishing features of LA-MPI, 
including support for concurrent use of multiple types of network interface, and
reliable message transmission utilizi ... 



Keywords: MPI, fault tolerance, message passing 



17 Improving the dependability of network management systems 
Elias Procopio Duarte, Glenn Mansfield, Takashi Nanya, Shoichi Noguchi 
July 1998 International Journal of Network Management, Volume 8 Issue 4 

Full text available: pdf(147.51 KB) Additional Information: full citation, abstract, references, index terms

As computer networks expand, there is a pressing need for management systems
capable of handling errors. This article proposes an approach based on 
management proxies to improve the dependability of fault management systems. 
An effective MIB to implement the proxies is presented, which allows their
deployment at virtually no cost. As an example, a case study of a WAN is carried
out. © 1998 John Wiley & Sons, Ltd.

C. W. Hemming, S. A. Szygenda

August 1972 Proceedings of the ACM annual conference - Volume 1 

Full text available: pdf(882.95 KB) Additional Information: full citation, abstract, references, citings

Simulation of digital logic provides a viable technique for development and 
diagnosis of digital systems. Simulation models currently employed are discussed 
with a summary of structure and timing techniques. A methodology for functional 
simulation in conjunction with gate level simulation is discussed, presenting a
representative set of predefined functions, and introducing a measure for 
predefined function performance. Errors in design detectable at the functional level 
are categorized.

Keywords: diagnosis of digital systems, digital simulation, fault simulation, 
functional simulation, logic design 



19 Network performance reporting
K. Terplan 

April 1982 Proceedings of the Computer Network Performance Symposium 

Full text available: pdf(655.20 KB) Additional Information: full citation, abstract, references, index terms

Managing networks using Network Administration Centers is increasingly
considered. After introducing the information demand for operational, tactical and
strategic network management the paper is dealing with the investigation of the 
applicability of tools and techniques for these areas. Network monitors and software 
problem determination tools are investigated in greater detail. Also implementation
details for a multihost-multinode network including software and hardware tools
combined by ... 

20 Slipstream processors: improving both performance and fault tolerance
Karthik Sundaramoorthy, Zach Purser, Eric Rotenberg

November 2000 Proceedings of the ninth international conference on 
Architectural support for programming languages and 
operating systems, Volume 28 , 34 Issue 5 , 5 

Full text available: pdf(111.54 KB) Additional Information: full citation, abstract, references, citings, index terms

Processors execute the full dynamic instruction stream to arrive at the final output 
of a program, yet there exist shorter instruction streams that produce the same 
overall effect. We propose creating a shorter but otherwise equivalent version of 
the original program by removing ineffectual computation and computation related 
to highly-predictable control flow. The shortened program is run concurrently with 
the full program on a chip multiprocessor or simultaneous multithreaded processor,
with two ... 

Results 1 - 20 of 69

The ACM Portal is published by the Association for Computing Machinery. Copyright © 2004 ACM, Inc. 



US Patent & Trademark Office 



+network +bypass ring fault error +diagnose +restore



Terms used 

network bypass ring loop fault error diagnose restore 




Found 17 of 139,988




Results 1 - 17 of 17 


1 Cellular Disco: resource management using virtual clusters on shared-memory
multiprocessors 

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum 
December 1999 ACM SIGOPS Operating Systems Review , Proceedings of the 
seventeenth ACM symposium on Operating systems principles,
Volume 33 Issue 5

Full text available: pdf(1.93 MB) Additional Information: full citation, abstract, references, citings, index terms



Despite the fact that large-scale shared-memory multiprocessors have been 
commercially available for several years, system software that fully utilizes all their 
features is still not available, mostly due to the complexity and cost of making the
required changes to the operating system. A recently proposed approach, called 
Disco, substantially reduces this development cost by using a virtual machine 
monitor that leverages the existing operating system technology. In this paper we 
present a syste ... 



2 Experience Using Multiprocessor Systems— A Status Report
Anita K. Jones, Peter Schwarz 

June 1980 ACM Computing Surveys (CSUR), Volume 12 Issue 2 

3 Internet Nuggets
Mark Thorson 

March 1996 ACM SIGARCH Computer Architecture News, Volume 24 Issue 1 

Full text available: pdf(393.52 KB) Additional Information: full citation, abstract

This column consists of selected traffic from the comp.arch newsgroup, a forum for 
discussion of computer architecture on Internet— an international computer 
network. 

4 Cellular disco: resource management using virtual clusters on shared-memory
multiprocessors 

Kinshuk Govil, Dan Teodosiu, Yongqiang Huang, Mendel Rosenblum 

August 2000 ACM Transactions on Computer Systems (TOCS), Volume 18 Issue 3 

Full text available: pdf(287.05 KB) Additional Information: full citation, abstract, references, citings, index terms, review




Despite the fact that large-scale shared-memory multiprocessors have been 
commercially available for several years, system software that fully utilizes all their
features is still not available, mostly due to the complexity and cost of making the
required changes to the operating system. A recently proposed approach, called 
Disco, substantially reduces this development cost by using a virtual machine 
monitor that leverages the existing operating system technology. In this paper we
present a ... 

Keywords: fault containment, resource management, scalable multiprocessors,
virtual machines 



5 Architecture of the IBM System/370
Richard P. Case, Andris Padegs 

January 1978 Communications of the ACM, Volume 21 Issue 1 

Full text available: pdf(2.78 MB) Additional Information: full citation, abstract, references, citings, index terms

This paper discusses the design considerations for the architectural extensions that 
distinguish System/370 from System/360. It comments on some experiences with 
the original objectives for System/360 and on the efforts to achieve them, and it 
describes the reasons and objectives for extending the architecture. It covers 
virtual storage, program control, data-manipulation instructions, timing facilities, 
multiprocessing, debugging and monitoring, error handling, and input/output 
operations. ... 

Keywords: architecture, computer systems, error handling, instruction sets, 
virtual storage 

6 Knowledge based fault management for OSI networks
Celia A. Joseph, A. Sherzer, K. Muralidhar

June 1990 Proceedings of the third international conference on Industrial and 
engineering applications of artificial intelligence and expert systems 
- Volume 1 

Full text available: pdf(826.21 KB) Additional Information: full citation, abstract, references, index terms

The OSI Fault Management system (OSIFaM) is an evolving knowledge-based
system for fault management of Open System Interconnection (OSI) networks. Our 
goal is to develop a knowledge-based tool that will reduce the expertise needed to 
recognize, diagnose and correct faults in OSI networks. For our first 
implementation, we are focusing on MAP 3.0 networks. This paper provides an 
overview of fault management in general, a brief survey of other fault management 
developments, the characteristics ... 

7 Selective interpretation as a technique for debugging computationally intensive
programs 

B. B. Chase, R. T. Hood 

July 1987 ACM SIGPLAN Notices , Papers of the Symposium on Interpreters 
and interpretive techniques, Volume 22 Issue 7

Full text available: pdf(772.68 KB) Additional Information: full citation, abstract, index terms

As part of Rice University's project to build a programming environment for 
scientific software, we have built a facility for program execution that solves some 
of the problems inherent in debugging large, computationally intensive programs. 
By their very nature such programs do not lend themselves to full-scale 
interpretation. In moderation however, interpretation can be extremely useful 
during the debugging process. In addition to discussing the particular benefits that 
we expect from interpre ...



8 Slipstream processors: improving both performance and fault tolerance 
Karthik Sundaramoorthy, Zach Purser, Eric Rotenberg
November 2000 Proceedings of the ninth international conference on
Architectural support for programming languages and
operating systems, Volume 28 , 34 Issue 5 , 5

Processors execute the full dynamic instruction stream to arrive at the final output
of a program, yet there exist shorter instruction streams that produce the same
overall effect. We propose creating a shorter but otherwise equivalent version of 
the original program by removing ineffectual computation and computation related 
to highly-predictable control flow. The shortened program is run concurrently with 
the full program on a chip multiprocessor or simultaneous multithreaded processor,
with two ... 



9 Slipstream processors: improving both performance and fault tolerance
Karthik Sundaramoorthy, Zach Purser, Eric Rotenberg

Processors execute the full dynamic instruction stream to arrive at the final output
of a program, yet there exist shorter instruction streams that produce the same 
overall effect. We propose creating a shorter but otherwise equivalent version of 
the original program by removing ineffectual computation and computation related 
to highly-predictable control flow. The shortened program is run concurrently with 
the full program on a chip multiprocessor or simultaneous multithreaded processor, 
with t ... 



10 Network performance reporting
K. Terplan

April 1982 Proceedings of the Computer Network Performance Symposium

Full text available: pdf(655.20 KB) Additional Information: full citation, abstract, references, index terms

Managing networks using Network Administration Centers is increasingly
considered. After introducing the information demand for operational, tactical and 
strategic network management the paper is dealing with the investigation of the 
applicability of tools and techniques for these areas. Network monitors and software 
problem determination tools are investigated in greater detail. Also implementation 
details for a multihost-multinode network including software and hardware tools
combined by ... 



11 BPF+: exploiting global data-flow optimization in a generalized packet filter 
architecture 

Andrew Begel, Steven McCanne, Susan L. Graham 

August 1999 ACM SIGCOMM Computer Communication Review , Proceedings of 
the conference on Applications, technologies, architectures, and 
protocols for computer communication, Volume 29 Issue 4

Full text available: pdf(1.55 MB) Additional Information: full citation, abstract, references, citings

A packet filter is a programmable selection criterion for classifying or selecting
packets from a packet stream in a generic, reusable fashion. Previous work on 
packet filters falls roughly into two categories, namely those efforts that investigate 
flexible and extensible filter abstractions but sacrifice performance, and those that 
focus on low-level, optimized filtering representations but sacrifice flexibility. 
Applications like network monitoring and intrusion detection, however, requ ... 



12 Microcode implemented General Modular Redundancy
F. P. Mathur, P. T. de Sousa

September 1974 Conference record of the 7th annual workshop on
Microprogramming

Full text available: pdf Additional Information: full citation, abstract, references, index

First the concepts of protective redundancy are described in the unified framework
called General Modular Redundancy (GMR). GMR is a unified framework which
synthesizes all the major redundancy techniques known. An alternative to an
exclusively hardware implementation is by means of an extension to the 
Wensleylan Software Implemented Fault-Tolerance (SIFT) approach. A more 
attractive alternative, an implementation in microcode, is proposed and described 
here. 

13 The <bigwig> project 

Claus Brabrand, Anders Møller, Michael I. Schwartzbach

May 2002 ACM Transactions on Internet Technology (TOIT), Volume 2 Issue 2 

Full text available: pdf(586.33 KB) Additional Information: full citation, abstract, references, citings, index terms

We present the results of the <bigwig> project, which aims to design and 
implement a high-level domain-specific language for programming interactive Web
services. 

A fundamental aspect of the development of the World Wide Web during the last 
decade is the gradual change from static to dynamic generation of Web pages. 
Generating Web pages dynamically in dialog with the client has the advantage of 
providing up-to-date and tailor-made information. The development of systems ... 

Keywords: Interactive Web services, program analysis 

C. Halatsis, A. van Dam, J. Joosten, M. Letheren

May 1980 Proceedings of the 7th annual symposium on Computer Architecture 

This paper describes architectural considerations which led to the design of a fast
programmable processor made from ECL bit-slices. The processor will be used as
an on-line data filtering engine for high energy physics experiments. Unlike prior 
designs of such engines, the processor supports both user (horizontal) microcode 
and emulation of the PDP-11 fixed point instruction set (without memory 
management and multiple interrupt levels). In addition to an overview of the 
techniques used to ... 

Partha Pal, Franklin Webber, Richard Schantz

September 2001 Proceedings of the 2001 workshop on New security paradigms 

Full text available: pdf(783.75 KB) Additional Information: full citation, abstract, references, index terms

Attack survival, which means the ability to provide some level of service despite an 
ongoing attack by tolerating its impact, is an important objective of security 
research. In this paper we present a new approach to survivability and intrusion 
tolerance. Our approach, which we call "survival by defense" is based on the 
observation that many applications can be given increased resistance to malicious 
attack even though the environment in which they run is untrustworthy. This paper
describes the ... 